Abstract

This paper presents a novel method for detecting multiple
moving targets in real time from infrared (IR) image sequences
collected by an airborne IR camera. The method is based on a
dynamic Gabor filter and a dynamic Gaussian detector. First, the
ego-motion induced by the airborne platform is modeled by a
parametric affine transformation estimated from feature point
matching, and the IR video is stabilized by eliminating the
background motion. Then, a dynamic Gabor filter is employed to
enhance the image changes for accurate detection and localization
of moving targets. The orientation of the Gabor filter is changed
dynamically according to the orientation of the optical flows.
Next, the specular highlights generated by the dynamic Gabor
filter are detected. The outliers and specular highlights are
fused to identify the moving targets. The experimental results
show that the proposed detection algorithm is effective and
efficient, with a detection speed of approximately 2 frames per
second.

1.  Introduction 

Detection of moving targets in infrared (IR) imagery is a
challenging research topic in computer vision. Detecting and
localizing a moving target accurately is important for automatic
tracking system initialization and for recovery from tracking
failure. Although many methods have been developed for detecting
and tracking targets in visual images (generated by daytime
cameras), there exists only a limited amount of work on target
detection and tracking from IR imagery in the computer vision
community [1]. In comparison to visual images, the images
obtained from an IR camera have an extremely low signal-to-noise
ratio, which results in limited information for performing
detection and tracking tasks. In addition, in airborne IR images,
non-repeatability of the target signature, competing background
clutter, lack of a priori information, high ego-motion of the
sensor, and artifacts due to weather conditions make detection or
tracking of targets even harder. To overcome these shortcomings
of IR imagery, different approaches impose different constraints
and so provide solutions for only a limited number of situations.
For instance, several detection methods require that the targets
are hot spots which appear as bright regions in the IR images
[2] [3] [4]. However, in realistic target detection scenarios,
none of these assumptions is applicable, and a robust detection
method must deal successfully with these problems.

This paper presents an approach for robust real-time target
detection in airborne IR imagery. The approach has the following
characteristics: (1) it is robust in the presence of high global
motion and significant texture in the background, (2) it does not
require that targets have constant velocity or acceleration, and
(3) it does not assume that target features remain largely
unchanged over the course of tracking. The main contribution of
this paper is the complete algorithm presented, which has two
focal points. The first is the dynamic Gabor filter, in which the
orientation of the Gabor filter is controlled by the orientation
of the optical flows. The second is the dynamic Gaussian
detector, which is used to identify the target location. The
following sections describe the algorithm in detail.

2.  Algorithm  description 

The algorithm can be formulated in four steps: (i) motion
compensation, (ii) dynamic Gabor filtering, (iii) specular
highlight detection, and (iv) target localization. These
processing steps are described in detail below.


Fig. 1 (a) and (b) Two input images; (c) Detected optical
flows; (d) Image changes; (e) Partial enlargement of (c) showing
the outliers and inliers; (f) Dynamic Gabor filter response;
(g) Specular highlights; (h) Clusters of specular highlights.

2.1.  Motion  compensation 

Motion compensation consists of feature point extraction,
optical flow detection, global parametric motion model
estimation, and motion detection.

A. Feature point detection. Feature point extraction is the
first step of the algorithm. The Harris corner detector, the
Shi-Tomasi corner detector, SUSAN, SIFT, SURF, and FAST are some
representative feature point detection algorithms developed over
the past two decades. We evaluated these algorithms according to
two criteria, processing time and detection accuracy. Our
experimental results show that the Shi-Tomasi method is more
reliable than the others and is also fast. Therefore, this work
employs the Shi-Tomasi method for feature point detection. For
two input images, $I^{t'}$ and $I^{t}$, let
$P^{t'} = \{p_1^{t'}, \ldots, p_M^{t'}\}$ and
$P^{t} = \{p_1^{t}, \ldots, p_N^{t}\}$ denote the feature points
detected from $I^{t'}$ and $I^{t}$, respectively, where
$t' = t - \Delta$, $p_i^{t'} = (x_i^{t'}, y_i^{t'})$,
$p_j^{t} = (x_j^{t}, y_j^{t})$, $i = 1, 2, \ldots, M$, and
$j = 1, 2, \ldots, N$. In the following, $I^{t'}$ is called the
previous image, and $I^{t}$ is called the current image or
reference image.
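As a rough illustration of the criterion behind Shi-Tomasi feature detection, the sketch below scores an image patch by the smaller eigenvalue of its 2x2 gradient structure tensor. It is a minimal pure-Python sketch and not the implementation used in this work; the function name `shi_tomasi_score`, the central-difference gradients, and the unweighted window are assumptions made for illustration.

```python
import math


def shi_tomasi_score(patch):
    """Smaller eigenvalue of the structure tensor summed over a patch.

    `patch` is a list of rows of grey levels. Gradients use central
    differences on interior pixels only (an illustrative choice).
    """
    sxx = sxy = syy = 0.0
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            gx = (patch[y][x + 1] - patch[y][x - 1]) / 2.0
            gy = (patch[y + 1][x] - patch[y - 1][x]) / 2.0
            sxx += gx * gx
            sxy += gx * gy
            syy += gy * gy
    # Closed-form eigenvalues of [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2.0
    root = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy * sxy)
    return half_trace - root


# A flat patch and a straight edge score zero; a corner scores high.
flat = [[5] * 5 for _ in range(5)]
edge = [[0, 0, 10, 10, 10] for _ in range(5)]
corner = [[10 if (x >= 2 and y >= 2) else 0 for x in range(5)]
          for y in range(5)]
```

A pixel would be kept as a feature point when its score exceeds a chosen threshold; the threshold is not given in the text.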

B. Optical flow detection. Many optical flow detection
algorithms exist, and there have been several recent developments
on this topic. The evaluation of these algorithms in [5] shows
that Bouguet's method [6] is the best for the interpolation task.
Our algorithm therefore employs Bouguet's method for optical flow
detection. Let $F^{t't} = \{F_1^{t't}, \ldots, F_K^{t't}\}$
denote the detected optical flows. Feature points in $P^{t'}$ and
$P^{t}$ from which no optical flow is detected are filtered out.
After this filtering operation, the number of feature points in
each of the two sets $P^{t'}$ and $P^{t}$ equals the number of
optical flows in the set $F^{t't}$, that is, $K$. Fig. 1 (a) and
(b) show two input images, $I^{t'}$ and $I^{t}$, where $\Delta$
is set at 3. Fig. 1 (c) shows the optical flows detected from the
feature points, where the optical flows are marked by red line
segments and their endpoints are marked by green dots (refer to
(e) for the partially enlarged picture).

C. Motion model estimation. Generally, the coordinate
transformation is found by assuming that it takes one of the
following six models, (i) translation, (ii) affine, (iii)
bilinear, (iv) projective, (v) pseudo perspective, and (vi)
biquadratic, and then estimating the two to twelve parameters of
the chosen model. The translation model is not applicable to
problems that involve rotation. Complicated models such as the
projective and biquadratic models are computationally heavy, and
their parameters are difficult to estimate. We tested the affine,
bilinear, and pseudo perspective models by adding some error to a
parameter and checking how the image is distorted. The
experimental results show that the affine model is robust to
parameter estimation error. Therefore, our method uses the affine
model. The six parameters of the affine model are estimated from
the feature points as follows.

(1) Feature points are separated into two categories: the
inlier sets $P_{in}^{t'}$ and $P_{in}^{t}$, and the outlier sets
$P_{out}^{t'}$ and $P_{out}^{t}$. The feature points associated
with the moving targets are called outliers. Those associated
with the background are called inliers.

(2) Outliers are clustered by a distance-based clustering
algorithm, and the clusters are later used for target
identification. Inliers are used to estimate the affine model
with a RANSAC-like algorithm. Let $A_{t'}^{t}$ denote the
estimated affine model.
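The two steps above can be sketched as a small consensus search over matched feature points. This is a minimal illustration under stated assumptions, not the RANSAC-like procedure actually used: the function names, the 1-pixel tolerance `tol`, the iteration count, and the choice to fit each candidate model from exactly three correspondences are all hypothetical.

```python
import random


def solve3(m, r):
    """Solve the 3x3 system m * p = r by Cramer's rule; None if singular."""
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
    d = det(m)
    if abs(d) < 1e-9:
        return None
    out = []
    for col in range(3):
        mc = [row[:] for row in m]
        for i in range(3):
            mc[i][col] = r[i]
        out.append(det(mc) / d)
    return out


def affine_from_triplet(src, dst):
    """Six affine parameters (a, b, c, d, e, f) mapping 3 src points to dst."""
    m = [[x, y, 1.0] for x, y in src]
    row_x = solve3(m, [p[0] for p in dst])
    row_y = solve3(m, [p[1] for p in dst])
    if row_x is None or row_y is None:
        return None
    return tuple(row_x + row_y)


def apply_affine(model, p):
    a, b, c, d, e, f = model
    return (a * p[0] + b * p[1] + c, d * p[0] + e * p[1] + f)


def ransac_affine(prev_pts, curr_pts, tol=1.0, iters=200, seed=0):
    """Return (model, inlier indices, outlier indices)."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iters):
        idx = rng.sample(range(len(prev_pts)), 3)
        model = affine_from_triplet([prev_pts[i] for i in idx],
                                    [curr_pts[i] for i in idx])
        if model is None:  # collinear sample, skip it
            continue
        inliers = []
        for i, (p, q) in enumerate(zip(prev_pts, curr_pts)):
            u, v = apply_affine(model, p)
            if (u - q[0]) ** 2 + (v - q[1]) ** 2 <= tol * tol:
                inliers.append(i)
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    inlier_set = set(best_inliers)
    outliers = [i for i in range(len(prev_pts)) if i not in inlier_set]
    return best_model, best_inliers, outliers


# Eight background points translated by (2, -1), two independently moving points.
prev = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 2), (3, 7), (8, 4), (1, 9),
        (4, 4), (6, 6)]
curr = [(x + 2, y - 1) for x, y in prev[:8]] + [(11, 7), (2, 11)]
model, inliers, outliers = ransac_affine(prev, curr)
```

In a fuller implementation the final model would normally be refitted by least squares over all inliers; that step is omitted here for brevity.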


D. Motion image generation. In airborne imagery, the motion
image represents the changes caused by the moving targets. The
previous image is transformed by the affine model $A_{t'}^{t}$
and subtracted from the current image. Fig. 1 (d) shows the
motion image generated in this way from the two input images in
Fig. 1 (a) and (b).
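The warp-and-subtract step can be illustrated as follows. This is a minimal sketch that assumes the affine model maps previous-frame coordinates to current-frame coordinates, uses nearest-neighbour sampling, and takes the absolute difference; none of these details is stated in the text, and the function name `motion_image` is hypothetical.

```python
def motion_image(prev, curr, model):
    """Absolute difference between curr and the affine-warped prev.

    `model` = (a, b, c, d, e, f) maps (u, v) in the previous frame to
    (a*u + b*v + c, d*u + e*v + f) in the current frame. Each current
    pixel is mapped back through the inverse model to find its source.
    """
    a, b, c, d, e, f = model
    det = a * e - b * d
    if abs(det) < 1e-12:
        raise ValueError("affine model is not invertible")
    h, w = len(curr), len(curr[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            u = (e * (x - c) - b * (y - f)) / det
            v = (-d * (x - c) + a * (y - f)) / det
            ui, vi = int(round(u)), int(round(v))
            if 0 <= ui < w and 0 <= vi < h:
                out[y][x] = abs(curr[y][x] - prev[vi][ui])
            # Pixels with no source in the previous frame stay at 0.
    return out


# Background shifts right by one pixel; one pixel changes by 50.
prev_img = [[10 * y + x for x in range(4)] for y in range(4)]
curr_img = [[prev_img[y][x - 1] if x >= 1 else 0 for x in range(4)]
            for y in range(4)]
curr_img[2][2] += 50
diff = motion_image(prev_img, curr_img, (1, 0, 1, 0, 1, 0))
```

After compensation only the changed pixel remains non-zero, which is the behaviour the stabilization step is meant to achieve.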

2.2.  Dynamic  Gabor  filter 

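A Gabor wavelet is defined as

$$\psi_{\mu,\nu}(z) = \frac{\|k_{\mu,\nu}\|^2}{\sigma^2} \exp\left(-\frac{\|k_{\mu,\nu}\|^2 \|z\|^2}{2\sigma^2}\right) \left[\exp(i\,k_{\mu,\nu} \cdot z) - \exp\left(-\frac{\sigma^2}{2}\right)\right] \qquad (1)$$

where $z = (x, y)$ is the point with horizontal coordinate $x$
and vertical coordinate $y$. The parameters $\mu$ and $\nu$
define the orientation and scale of the Gabor kernel,
$\|\cdot\|$ denotes the norm operator, and $\sigma$ is related to
the standard deviation of the Gaussian window in the kernel and
determines the ratio of the Gaussian window width to the
wavelength. The wave vector $k_{\mu,\nu}$ is defined as follows

$$k_{\mu,\nu} = k_{\nu} e^{i\phi_{\mu}} \qquad (2)$$

where $k_{\nu} = k_{\max}/f^{\nu}$ and $\phi_{\mu} = \pi\mu/8$,
$k_{\max}$ is the maximum frequency, and $f$ is the spacing
factor between kernels in the frequency domain.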

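In our algorithm, we fix the following parameters:
$k_{\max} = \pi/2$, $\sigma = 2\pi$, $f = \sqrt{2}$, and
$\nu = 3$. The orientation $\mu$ is changed dynamically according
to the optical flows from the inliers. We call this the dynamic
Gabor filter. The orientation $\mu$ is defined as

$$\mu = \frac{1}{K_{in}} \sum_{i=1}^{K_{in}} \theta(F_i^{t't}) \qquad (3)$$

where $K_{in}$ is the number of inlier optical flows and
$\theta(F_i^{t't})$ is the orientation of the optical flow
$F_i^{t't} \in F_{in}^{t't}$, given by

$$\theta(F_i^{t't}) = \arctan\frac{y_i^{t} - y_i^{t'}}{x_i^{t} - x_i^{t'}} \qquad (4)$$

with $(x_i^{t'}, y_i^{t'})$ and $(x_i^{t}, y_i^{t})$ the
endpoints of the flow in the previous and current images.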
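The orientation computation can be sketched in a few lines. The sketch assumes that the per-flow orientations are combined by a plain arithmetic mean over the inlier flows, and it swaps the arctangent of the coordinate ratio for `math.atan2`, which gives the same angle for flows pointing into the right half-plane but also handles purely vertical flows. The function names are hypothetical.

```python
import math


def flow_orientation(p_prev, p_curr):
    """Orientation (radians) of the flow from p_prev to p_curr."""
    dx = p_curr[0] - p_prev[0]
    dy = p_curr[1] - p_prev[1]
    return math.atan2(dy, dx)


def mean_orientation(prev_pts, curr_pts):
    """Arithmetic mean of the flow orientations of matched point pairs."""
    angles = [flow_orientation(p, q) for p, q in zip(prev_pts, curr_pts)]
    return sum(angles) / len(angles)


# Two inlier flows at 45 and 0 degrees average to 22.5 degrees.
mu = mean_orientation([(0, 0), (0, 0)], [(1, 1), (1, 0)])
```

An arithmetic mean is only meaningful when the flow directions are tightly grouped, as they are for background motion; widely spread angles would call for a circular mean instead.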

Fig. 1 (f) shows the Gabor filter response obtained by
convolving the frame difference image in Fig. 1 (d) with the
dynamic Gabor kernel. The dynamic Gabor filter enhances the frame
difference.
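For illustration, the fixed parameters quoted above can be turned into a sampled kernel using the commonly used form of the Gabor wavelet (Gaussian envelope times a DC-compensated complex carrier). This is a sketch under assumptions: the orientation is passed directly as an angle `phi` in radians, the kernel half-size is arbitrary, and the response is taken as the magnitude of a zero-padded convolution. The function names are hypothetical.

```python
import cmath
import math

K_MAX = math.pi / 2    # maximum frequency, as fixed in the text
SIGMA = 2 * math.pi
F = math.sqrt(2)       # spacing factor between kernels
NU = 3                 # scale index


def gabor_kernel(phi, half_size=8):
    """Complex Gabor kernel for orientation angle phi (radians)."""
    k = K_MAX / F ** NU
    kx, ky = k * math.cos(phi), k * math.sin(phi)
    k2, s2 = k * k, SIGMA * SIGMA
    dc = math.exp(-s2 / 2.0)  # DC-compensation term
    kernel = []
    for y in range(-half_size, half_size + 1):
        row = []
        for x in range(-half_size, half_size + 1):
            envelope = (k2 / s2) * math.exp(-k2 * (x * x + y * y) / (2.0 * s2))
            carrier = cmath.exp(1j * (kx * x + ky * y)) - dc
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def gabor_response(image, kernel):
    """Magnitude of the zero-padded 2-D convolution of image with kernel."""
    half = len(kernel) // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0j
            for j in range(-half, half + 1):
                for i in range(-half, half + 1):
                    yy, xx = y - j, x - i
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[j + half][i + half] * image[yy][xx]
            out[y][x] = abs(acc)
    return out


kernel = gabor_kernel(math.pi / 8)
impulse = [[0.0] * 5 for _ in range(5)]
impulse[2][2] = 1.0
response = gabor_response(impulse, gabor_kernel(0.0, half_size=2))
```

With these parameters the envelope peak is 1/128, so in practice the response would be rescaled before display or thresholding.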


Fig. 2 Target detection results in frames 44, 50, 53, and 73.
Green circles mark the ground truth target positions, labeled
manually. Red circles mark targets detected based on outlier
clustering and specular highlights. Purple circles mark the
output of the dynamic Gaussian detector.

2.3.  Specular  highlights  detection 

As can be seen in Fig. 1 (f), the image changes appear as
high-intensity regions in the dynamic Gabor filter response. We
call these high-intensity regions specular highlights. The target
detection problem therefore becomes a specular highlight
detection problem. Because the intensity of the highlights varies
among the moving targets (some specular highlights are dimmer
than others), thresholding algorithms cannot detect all specular
highlights successfully. Instead, we use the pixel intensity on
three concentric circles $C_1$, $C_2$, and $C_3$, with radii
$R_1$, $R_2$, and $R_3$, respectively, centered at $C_0$, the
pixel under examination. The circles are sampled at intervals of
$\pi/6$, so the detector compares the intensity at $C_0$ only
with 12 sample points, $C_{j,1}, C_{j,2}, \ldots, C_{j,12}$, on
each circle $C_j$. Let $G(z)$ denote the dynamic Gabor filter
response at $z$. $C_0$ is a specular highlight if
$G(C_0) > G(C_{j,i})$ and $G(C_{j,i}) > G(C_{j+1,i})$ for
$j = 1, 2$ and $i = 1, 2, \ldots, 12$; otherwise it is not.
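The ring test can be sketched as below. The radii 7, 10, and 13 are the values reported in the experiments; rounding sample positions to the nearest pixel and rejecting pixels whose circles leave the image are assumptions of this sketch, and the function name is hypothetical.

```python
import math

RADII = (7, 10, 13)   # R1, R2, R3 as reported in the experiments
N_SAMPLES = 12        # samples spaced pi/6 apart on each circle


def is_specular_highlight(g, x0, y0, radii=RADII):
    """True if the response decreases strictly outward along all 12 rays.

    `g` is the filter response as a list of rows, (x0, y0) the pixel
    under examination.
    """
    h, w = len(g), len(g[0])
    for i in range(N_SAMPLES):
        angle = i * math.pi / 6.0
        inner = g[y0][x0]
        for r in radii:
            x = x0 + int(round(r * math.cos(angle)))
            y = y0 + int(round(r * math.sin(angle)))
            if not (0 <= x < w and 0 <= y < h):
                return False  # assumption: skip pixels near the border
            if not inner > g[y][x]:
                return False
            inner = g[y][x]
    return True


# Synthetic response: a single cone-shaped peak at (20, 20).
cone = [[100.0 - math.hypot(x - 20, y - 20) for x in range(41)]
        for y in range(41)]
flat = [[1.0] * 41 for _ in range(41)]
```

Because each ray is checked with a chain of comparisons, the condition that the centre exceeds the second circle follows automatically from the other two.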

The specular highlight points detected from the dynamic Gabor
filter response in Fig. 1 (f) are shown by red dots in
Fig. 1 (g). The specular highlight clustering results are shown
in Fig. 1 (h).

2.4.  Moving  target  localization 

Outliers and specular highlights are used to localize the
moving targets. The identification process is as follows.

(i) For a specular highlight, if its center lies within the
region of an outlier cluster, it is considered a target. If its
center does not lie in any cluster, the dynamic Gaussian
detector is employed.

(ii) The dynamic Gaussian detector is a general 2-D Gaussian
function whose orientation is controlled by the orientation of
the specular highlight. LACC is used as the similarity measure.
If LACC is larger than the threshold $T_0$, the specular
highlight is considered a moving target.
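Step (ii) can be illustrated with an oriented Gaussian template compared against the filter response around a highlight. LACC is not defined in this text, so the sketch substitutes a plain zero-mean normalized correlation coefficient as a stand-in similarity measure. The default values (amplitude 1, spreads 25.0 and 15.0, threshold 0.93) are taken from the experiments section on the assumption that they are the Gaussian amplitude, its two standard deviations, and the similarity threshold; all function names are hypothetical.

```python
import math


def oriented_gaussian(half, sigma_x, sigma_y, theta, amplitude=1.0):
    """(2*half+1)^2 template of a 2-D Gaussian rotated by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    rows = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            u = c * x + s * y    # coordinate along the rotated x axis
            v = -s * x + c * y   # coordinate along the rotated y axis
            row.append(amplitude * math.exp(
                -(u * u / (2 * sigma_x ** 2) + v * v / (2 * sigma_y ** 2))))
        rows.append(row)
    return rows


def correlation(a, b):
    """Zero-mean normalized correlation (stand-in for LACC)."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    ma, mb = sum(fa) / len(fa), sum(fb) / len(fb)
    num = sum((p - ma) * (q - mb) for p, q in zip(fa, fb))
    den = math.sqrt(sum((p - ma) ** 2 for p in fa)
                    * sum((q - mb) ** 2 for q in fb))
    return num / den if den > 0 else 0.0


def is_target(patch, theta, threshold=0.93, sigma_x=25.0, sigma_y=15.0):
    """Compare a square, odd-sized patch with the oriented template."""
    half = len(patch) // 2
    template = oriented_gaussian(half, sigma_x, sigma_y, theta)
    return correlation(patch, template) > threshold


template = oriented_gaussian(10, 25.0, 15.0, 0.3)
bright = [[3.0 * v + 5.0 for v in row] for row in template]
uniform = [[1.0] * 21 for _ in range(21)]
```

The correlation is invariant to gain and offset, so a highlight that is dimmer than others can still match the template, which fits the motivation for avoiding a fixed intensity threshold.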

3.  Experiment  results 

The entire algorithm described in Section 2 is implemented in
C++ with OpenCV on the Windows platform. $\Delta$ is set at 2,
the similarity threshold $T_0$ at 0.93, and $A$, $\sigma_x$, and
$\sigma_y$ at 1, 25.0, and 15.0, respectively. The radii $R_1$,
$R_2$, and $R_3$ of the three circles used for specular highlight
detection are 7, 10, and 13, respectively. The IR video data from
the Vivid datasets, provided by the Air Force Research
Laboratory, are used. Fig. 2 shows the target detection results
at frames 44, 50, 53, and 73 of an input image sequence. Green
circles mark the ground truth target positions, labeled manually;
red circles mark targets detected based on outlier clustering and
specular highlights; and purple circles mark the output of the
dynamic Gaussian detector.

4.  Performance  analysis 

To evaluate the performance of the algorithm, we selected four
image sequences with significant background texture as the test
data. Each sequence contains 100 frames, and each frame contains
two to four moving targets. The ground truth targets are labeled
manually. The total number of targets in the four datasets is
1231. We examined the correct detection rate, the hit rate, and
the processing time. The hit rate is defined as the ratio of the
area of intersection between the detected target and the ground
truth target to the area of the ground truth target. The
experiments were conducted on a Windows Vista machine with a
2.33 GHz Intel Core 2 CPU and 2 GB of main memory. The overall
correct detection rate is 86.6%, and the average hit rate is
78.6%. The detailed detection results are shown in Table 1. The
average processing time is 581 ms/frame.


Table 1. Target detection results.

                     Data 1   Data 2   Data 3   Data 4
Total targets           381      266      287      297
Detected targets        326      221      249      270
Missed targets           55       45       38       27
Correct detection     85.6%    83.1%    86.8%    90.9%
Miss detection        14.4%    16.9%    13.2%     9.1%
Hit rate              85.9%    81.3%    70.7%    76.6%
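The summary figures follow directly from the counts in Table 1, as the short sketch below shows. The hit-rate function assumes that targets are represented as axis-aligned boxes (x0, y0, x1, y1), which the text does not specify; the function and variable names are hypothetical.

```python
def hit_rate(detected, truth):
    """Intersection area of two boxes divided by the ground truth area."""
    ix = max(0, min(detected[2], truth[2]) - max(detected[0], truth[0]))
    iy = max(0, min(detected[3], truth[3]) - max(detected[1], truth[1]))
    truth_area = (truth[2] - truth[0]) * (truth[3] - truth[1])
    return ix * iy / float(truth_area)


def detection_rate(total, detected):
    """Correct detection rate in percent."""
    return 100.0 * detected / total


# (total targets, detected targets) per dataset, copied from Table 1.
COUNTS = {"Data 1": (381, 326), "Data 2": (266, 221),
          "Data 3": (287, 249), "Data 4": (297, 270)}
HIT_RATES = (85.9, 81.3, 70.7, 76.6)

total = sum(t for t, _ in COUNTS.values())
found = sum(d for _, d in COUNTS.values())
overall = detection_rate(total, found)            # about 86.6
mean_hit = sum(HIT_RATES) / len(HIT_RATES)        # about 78.6
frames_per_second = 1000.0 / 581.0                # about 1.7
```

The overall detection rate is pooled over all 1231 targets, whereas the reported hit rate matches the unweighted mean of the four per-dataset values.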

5.  Conclusions  and  future  work 

This paper described a method for detecting multiple moving
targets in airborne IR imagery. We tested the algorithm on
airborne IR videos from the AFRL Vivid datasets. The correct
detection rate is 86.6%, and the hit rate for the correct
detections is 78.6%. The processing time is 581 ms/frame, that
is, approximately 2 frames per second. This speed meets the
requirements of many real-time target detection and tracking
systems. Future work is to apply this algorithm to target
tracking systems.